On the distance concentration awareness of certain data reduction techniques

Author

  • Ata Kabán
Abstract

We make a first investigation into a recently raised concern: are existing data analysis techniques suitable, given the counter-intuitive properties of high-dimensional data spaces such as the phenomenon of distance concentration? We assume a generic linear model with a latent variable and additive unstructured noise. Under this structural assumption, we find that dimension reduction that explicitly guards against distance concentration recovers three well-known techniques: Fisher's linear discriminant analysis, Fisher's discriminant ratio and a variant of projection pursuit. Extending the analysis to regression uncovers a close link to sure independence screening, a recently proposed technique for variable selection in ultra-high-dimensional feature spaces. Hence, these techniques may be seen as distance concentration aware, even though they were not explicitly designed to have this property. Throughout our analysis, we make no assumptions about the distributions of the variables involved beyond the dependency structure implied by the linear model. © 2010 Elsevier Ltd. All rights reserved.
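The abstract names several quantities without defining them. The sketch below illustrates them using common textbook definitions. It does not reproduce the paper's derivation.

- **Distance concentration** is measured as the relative variance Var(‖x‖)/E(‖x‖)² of the norms of random points. The points are assumed to have i.i.d. Uniform(0, 1) coordinates.
- **Fisher's discriminant ratio** is computed for a single feature.
- **Sure independence screening** ranks features by the absolute value of their marginal correlation with the response.

The data model, the use of population (divide-by-n) variances, and all function names are assumptions made for illustration only.

```python
import math
import random


def relative_variance_of_norms(dim, n_points=500, seed=0):
    """Estimate Var(||x||) / E(||x||)^2 for points with i.i.d. Uniform(0, 1) coordinates.

    Assumed data model for illustration. Distance concentration shows up as
    this ratio shrinking toward 0 as dim grows: all points end up at roughly
    the same distance from the origin.
    """
    rng = random.Random(seed)
    norms = [math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
             for _ in range(n_points)]
    mean = sum(norms) / n_points
    var = sum((r - mean) ** 2 for r in norms) / n_points
    return var / mean ** 2


def fisher_discriminant_ratio(a, b):
    """Single-feature Fisher discriminant ratio (mu_a - mu_b)^2 / (var_a + var_b).

    Uses population variances (divide by n); a and b hold one feature's
    values for the two classes.
    """
    def mean(v):
        return sum(v) / len(v)

    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)

    return (mean(a) - mean(b)) ** 2 / (var(a) + var(b))


def sis_rank(columns, y, keep):
    """Sure independence screening: return the indices of the `keep` features
    whose absolute marginal correlation with the response y is largest."""
    my = sum(y) / len(y)
    syy = sum((b - my) ** 2 for b in y)

    def abs_corr(x):
        mx = sum(x) / len(x)
        sxx = sum((a - mx) ** 2 for a in x)
        if sxx == 0 or syy == 0:
            return 0.0  # constant feature or response: treat as uncorrelated
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return abs(sxy) / math.sqrt(sxx * syy)

    scores = [abs_corr(col) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:keep]


if __name__ == "__main__":
    # Relative variance of norms shrinks as the dimension grows.
    for d in (1, 10, 100, 1000):
        print(d, round(relative_variance_of_norms(d), 5))
    print(fisher_discriminant_ratio([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(sis_rank([[5, 1, 4, 2, 3], [1, 2, 3, 4, 5], [2, 2, 2, 2, 1]],
                   [1, 2, 3, 4, 5], keep=2))
```

Under the assumed uniform model, a delta-method argument gives a ratio of roughly 1/(5d) for large d. The printed values therefore fall by about a factor of ten per row, which is what distance concentration looks like empirically. In the example call, the Fisher ratio is 9 / (2/3 + 2/3) = 6.75. Screening keeps features 1 and 2, whose absolute correlations with y are 1 and about 0.71. Feature 0 is dropped, with an absolute correlation of 0.3.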


Similar resources

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) have been applied in a lossless dimensionality reduction framework for a face recognition application. In this framework, the benefits of dimensionality reduction were used to improve the performance of the predictive model, which was a support vector machine (...


IRDDS: Instance reduction based on Distance-based decision surface

In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...


Quantifying the effect of traffic on lead accumulation in soil: a case study in Iran

Road transport is a ubiquitous source of lead contamination in the soil near highways with direct and indirect impacts on human health. Accumulation of traffic-induced lead in the soils depends on gasoline lead content, traffic volume, as well as meteorological conditions. To evaluate the effect of traffic on soil lead concentration, 113 samples from the topsoil (0-15 cm) were collected in a re...


Frequency of Smoking and Specialized Awareness among Doctors and Nurses of Hospitals in Kerman, Iran

Background: Nicotine is one of the strongest poisons. Every year about 75,000 Iranians die because of smoking. Since doctors and nurses play a major role in controlling smoking, this study investigated the prevalence of cigarette smoking among doctors and nurses and their awareness of the effects of smoking. Methods: This descriptive study was conducted on all doctors (n = 150) a...


Assessment of the awareness and practice of residents of Gorgan, Gonbad and Aliabad Katool regarding municipal solid waste management

Background and purpose: People's lack of awareness of the future environmental risks posed by waste is the most important problem in this sector. Raising awareness, followed by cultural education in this area, could be a breakthrough in addressing this significant environmental problem. This study aimed to assess the degree of awareness of the people of the cities of Gorgan, Gonbad, and Aliabad Katool (Iran) in r...



Journal title:
  • Pattern Recognition

Volume 44

Publication date: 2011